A Deterministic Method for Structural Analysis of Compound Words in Japanese

Authors

  • Dongli Han
  • Takeshi Ito
  • Teiji Furugori
Abstract

Structural analysis of compound words is a necessary and important process in natural language processing. Proposed here is a corpus- and statistics-based method for the structural analysis of compound words in Japanese. We determine the structure of a compound word by using an Internet corpus and calculating the strength of word association among its constituent words. Experiments with compound words of 5, 6, 7, and 8 kanji show that our method works well and that it performs better than other comparable studies.
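The abstract does not say which association measure or search procedure the authors use, so the following is only a minimal sketch of the general idea: score adjacent constituents by association strength and bracket the strongest pair first. Pointwise mutual information with add-one smoothing, the greedy merge order, the function names, and all counts are assumptions made for illustration; none of them comes from the paper.

```python
import math

def pmi(left, right, pair_counts, unit_counts, total):
    """Smoothed pointwise mutual information (an assumed association measure)."""
    joint = pair_counts.get((left, right), 0) + 1
    freq_left = unit_counts.get(left, 0) + 1
    freq_right = unit_counts.get(right, 0) + 1
    return math.log(joint * total / (freq_left * freq_right))

def edge_word(node, side):
    """Return the leftmost (side=0) or rightmost (side=1) word under a node."""
    while isinstance(node, tuple):
        node = node[side]
    return node

def analyze(words, assoc):
    """Greedily merge the adjacent pair with the strongest association.

    Returns a nested tuple representing a binary bracketing of the compound.
    """
    nodes = list(words)
    while len(nodes) > 1:
        scores = [
            assoc(edge_word(nodes[i], 1), edge_word(nodes[i + 1], 0))
            for i in range(len(nodes) - 1)
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
    return nodes[0]

# Hypothetical corpus counts, invented for this example only.
unit_counts = {"自然": 1000, "言語": 500, "処理": 2000}
pair_counts = {("自然", "言語"): 80, ("言語", "処理"): 40}
total = 100_000

def assoc(left, right):
    return pmi(left, right, pair_counts, unit_counts, total)

structure = analyze(["自然", "言語", "処理"], assoc)
```

With these invented counts, 自然 and 言語 are more strongly associated than 言語 and 処理, so the sketch brackets the compound as ((自然 言語) 処理). A real system would take its counts from a corpus and might search over all bracketings instead of merging greedily.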


Similar resources

Mining Japanese Compound Words and Their Pronunciations from Web Pages and Tweets

Mining compound words and their pronunciations is essential for Japanese input method editors (IMEs). We propose to use a chunk-based dependency parser to mine new words, collocations and predicate-argument phrases from large-scale Japanese Web pages and tweets. The pronunciations of the compound words are automatically rewritten by a statistical machine translation (SMT) model. Experiments on a...


Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents

We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...


An Analysis of Persian Compound Nouns as Constructions

In Construction Morphology (CM), a compound is treated as a construction at the word level with a systematic correlation between its form and meaning, in the sense that any change in the form is accompanied by a change in the meaning. Compound words are coined by compounding templates which are called abstract schemas in CM. These abstract constructional schemas generalize over sets of existing...


Word-Forming Process in Azeri Turkish Language

This study examines the general methods of natural word formation in the Azeri Turkish language by analyzing the construction of compound Azeri Turkish words. Same’ei (2016) carried out a comprehensive study of the word-forming process in Farsi, which inspired this study of word formation in Azeri Turkish. Numerous scholars had done vari...


AN ADAPTIVE IMPORTANCE SAMPLING-BASED ALGORITHM USING THE FIRST-ORDER METHOD FOR STRUCTURAL RELIABILITY

Monte Carlo simulation (MCS) is a useful tool for computation of probability of failure in reliability analysis. However, the large number of samples, often required for acceptable accuracy, makes it time-consuming. Importance sampling is a method on the basis of MCS which has been proposed to reduce the computational time of MCS. In this paper, a new adaptive importance sampling-based algorith...



Publication date: 2002